Abstract. This study investigated what log files can reveal about learner behaviour
of low- and non-literate adults learning to read for the first time in Finnish as a 
second language. The participants’ reading development was supported by 
practising in an online training environment. Log files, automatically created
user-computer interaction records, were chosen as empirical evidence as their analysis
enables in-depth post-activity exploration of student behaviour. The quantitative 
analysis resulted in user profiles containing information on learner engagement, 
performance and productivity. Overall, the results demonstrate that individual 
learning performance, process, and progress can be studied and reflected on 
holistically by investigating the individual’s digital learning footprints, their log 
files. Log files are an accurate and precise, yet currently underemployed research
tool. Easier-to-use tools for non-experts are needed, as current Data Mining
(DM) tools are designed for computer scientists and need to be developed further
to become accessible to, and applicable by, practitioners and educational researchers.


Keywords: log files, learner behaviour, late literacy, computer-assisted language 
learning. 

1. Introduction 

A growing number of low- and non-literate adults are immigrating to highly literate
countries. According to Cucchiarini, van de Craats, Deutekom, and Strik (2013,
p. 96), 10-15% of the European immigrant population is estimated to be non- or
low-literate. In Finland, the world’s most literate country (Miller & McKenna,
2016), adult non-literacy is highly unusual and consequently, there is a paucity 
of academic research on how non-literate adults acquire skills in Finnish as their 
second language (for a review see Malessa, 2018). In practice, basic language 
courses are often insufficient to achieve functional literacy, even in the very 
transparent Finnish orthography (see Tammelin-Laine & Martin, 2015). 


The ‘Digital Literacy Instructor’ (DigLin) was pioneered to provide
Computer-Assisted Language Learning (CALL) support to low- or non-literate migrants
learning to read in English, Finnish, German, or Dutch. The project (2013-2015), 
funded by the European Commission, enabled the development of the DigLin 
software which provided systematic instruction in sound-letter connections, basic 
decoding, and word recognition. The DigLin training environment included 300 
words with seven different exercise types, including ‘Listen and drag the letters’ 
(DL) and ‘Listen and drag the words’ (DW); see Figure 1 below.


Figure 1. Screenshot of a DL (left) and DW (right) task 




The aim of the current mixed-method study was to analyse the learners’ log files 
automatically generated during the use of the software and to explore what log files 
can reveal about learner behaviour. 


2. Method 


2.1. Tracking computer-user interaction with log files 


This study’s seven participants tested the DigLin software for a period of four 
to six months (for details see Malessa & Filimban, 2017). The learners’ software 
use, including mouse/keyboard movements and microphone recordings, was
automatically tracked by log files. Time-stamped log files provide detailed and 
objective tracking data that can be employed to make inferences about learner 
knowledge, processes, and strategies (Chapelle, 2007, pp. 98-99). They are
currently underemployed, mainly because of the sheer volume of interaction
records they collect (Bruckman, 2006, p. 1449), as illustrated by Figure 2.


Figure 2. Extract of a DigLin log file 


7632;"[""O4FIN"",""314""]";"FIN";"2014-10-30 09:21:20";"2014-10-30 09:23:58";"Drag the letters 4a";
"[{""type"":""play_word_sound"",""data"":""sauna"",""timestamp"":""2014-10-30 09:21:20"",""data_extra"":""""},
{""type"":""hide_word_picture"",""data"":"""",""timestamp"":""2014-10-30 09:21:21"",""data_extra"":""""},
{""type"":""show_word_picture"",""data"":""sauna"",""timestamp"":""2014-10-30 09:21:21"",""data_extra"":""""},
{""type"":""letter_drag"",""data"":""s"",""timestamp"":""2014-10-30 09:21:26"",""data_extra"":""""},
{""type"":""letter_drag_right"",""data"":"""",""timestamp"":""2014-10-30 09:21:28"",""data_extra"":""""},
{""type"":""letter_drag"",""data"":""a"",""timestamp"":""2014-10-30 09:21:32"",""data_extra"":""""},
{""type"":""letter_drag_right"",""data"":"""",""timestamp"":""2014-10-30 09:21:34"",""data_extra"":""""},
{""type"":""letter_drag"",""data"":""u"",""timestamp"":""2014-10-30 09:21:34"",""data_extra"":""""},
{""type"":""letter_drag_right"",""data"":"""",""timestamp"":""2014-10-30 09:21:42"",""data_extra"":""""},
{""type"":""letter_drag"",""data"":""n"",""timestamp"":""2014-10-30 09:21:44"",""data_extra"":""""},
{""type"":""letter_drag_right"",""data"":"""",""timestamp"":""2014-10-30 09:21:45"",""data_extra"":""""},
{""type"":""letter_drag"",""data"":""a"",""timestamp"":""2014-10-30 09:21:49"",""data_extra"":""""},
{""type"":""letter_drag_right"",""data"":"""",""timestamp"":""2014-10-30 09:21:49""


Figure 2 presents the user interaction in a DL exercise. The documentation
(workload) of dragging the letters for the word sauna (09:21:21-09:21:49,
i.e. 28 seconds) emphasises the comprehensiveness of log file raw data. The workload
contains details regarding the exact start/end date and time of the event, exercise type, type
of actions taken, data involved, and provided feedback (right/false).
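
To illustrate how a record of this kind could be read programmatically, the following is a minimal Python sketch and not the procedure used in this study. It assumes that each exported row is semicolon-delimited with doubled quotation marks, that the fields appear in the order suggested by Figure 2, and that the final field is a JSON list of events. The field names in the returned dictionary, the function names, and the shortened sample row are hypothetical.

```python
import csv
import io
import json
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical, shortened row modelled on the Figure 2 extract (three events only).
SAMPLE_ROW = (
    '7632;"[""O4FIN"",""314""]";"FIN";"2014-10-30 09:21:20";'
    '"2014-10-30 09:23:58";"Drag the letters 4a";'
    '"[{""type"":""play_word_sound"",""data"":""sauna"",'
    '""timestamp"":""2014-10-30 09:21:20"",""data_extra"":""""},'
    '{""type"":""letter_drag"",""data"":""s"",'
    '""timestamp"":""2014-10-30 09:21:26"",""data_extra"":""""},'
    '{""type"":""letter_drag_right"",""data"":"""",'
    '""timestamp"":""2014-10-30 09:21:28"",""data_extra"":""""}]"'
)


def parse_log_row(row):
    """Split one exported row into its fields and decode the JSON parts."""
    # The csv module undoes the doubled quotation marks inside quoted fields.
    fields = next(csv.reader(io.StringIO(row), delimiter=";", quotechar='"'))
    # Assumed field order, inferred from Figure 2.
    log_id, user, language, start, end, exercise, events = fields
    return {
        "log_id": int(log_id),
        "user": json.loads(user),
        "language": language,
        "start": datetime.strptime(start, TIME_FORMAT),
        "end": datetime.strptime(end, TIME_FORMAT),
        "exercise": exercise,
        "events": json.loads(events),
    }


def event_span_seconds(events):
    """Seconds between the earliest and the latest event timestamp."""
    stamps = [datetime.strptime(e["timestamp"], TIME_FORMAT) for e in events]
    return (max(stamps) - min(stamps)).total_seconds()


record = parse_log_row(SAMPLE_ROW)
print(record["exercise"], len(record["events"]), event_span_seconds(record["events"]))
```

Decoding the events into a list of dictionaries makes every later step (counting actions, measuring time on task) a matter of iterating over that list.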


2.2. Raw data preparation and DM procedures 


DigLin’s log file dataset provided the empirical data for this study. Log files were 
accessed via phpMyAdmin and extracted for pre-processing, followed by DM, in 
which data is stored electronically in existing databases and the search is automated 
or augmented by computers (Witten, Frank, & Hall, 2011, p. 3). An initial analysis 
of the raw dataset yielded 3,141 log files. The data were then prepared for DM.
An event log dataset of 2,497 log files was created for the qualitative analysis
by excluding log files that contained no event data. For DM of this extensive database,
computerised Educational Data Mining (EDM) tools were investigated. However, given a
limited timescale, non-expert computing skills, and the lack of an easy-to-use tool,
EDM was conducted manually with the spreadsheet software Excel. Furthermore,
the scope of this study’s qualitative analysis was limited to 133 log files. The data
were restricted to the exercise types DL and DW (see Figure 1) as both focus on
the initial stages of reading development, training visual/aural grapheme-phoneme
correspondences.
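
The two preparation steps described above (dropping log files without event data and restricting the data to DL and DW) were carried out manually in this study. Purely as an illustration of how they might be automated, the following Python sketch assumes that each log file has already been decoded into a dictionary with an exercise name and a list of events. The sample records, the function names, and the exercise-name prefix for DW are hypothetical. Only 'Drag the letters' is attested in Figure 2.

```python
# Hypothetical decoded log records; only the structure matters here.
records = [
    {"id": 1, "exercise": "Drag the letters 4a", "events": [{"type": "letter_drag"}]},
    {"id": 2, "exercise": "Drag the words 2b", "events": []},
    {"id": 3, "exercise": "Drag the words 1a", "events": [{"type": "word_drag"}]},
    {"id": 4, "exercise": "Some other exercise", "events": [{"type": "click"}]},
]

# Assumed mapping from exercise code to the prefix of the logged exercise name.
EXERCISE_PREFIXES = {"DL": "Drag the letters", "DW": "Drag the words"}


def exercise_type(name):
    """Return 'DL' or 'DW' for a matching exercise name, otherwise None."""
    for code, prefix in EXERCISE_PREFIXES.items():
        if name.startswith(prefix):
            return code
    return None


def prepare(all_records):
    """Drop logs without events, then keep only DL and DW exercises."""
    with_events = [r for r in all_records if r["events"]]
    return [r for r in with_events if exercise_type(r["exercise"]) is not None]


selected = prepare(records)
print([r["id"] for r in selected])
```

Keeping the two filters as separate steps mirrors the description in the text and makes it easy to report the size of the dataset after each step.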


3. Results and discussion 


This study’s motivation was to investigate “what learners actually do, not what 
the researcher assumes instructions and task demands will lead learners to do” 
(Swain, 1998, p. 80). The results indicate that even though the testing sessions 
were relatively long (averaging 60 minutes), users were actively engaged, spending 
their time almost exclusively on-task. Log files’ workloads record user actions and
system feedback, thus providing information on how successfully users perform in
specific exercises. The overall success rate for letter drags in DL was unexpectedly
high (78.81%); however, the users’ performance was only studied for the specific
skills trained in the exercises, and therefore universal statements about learner
proficiency are impossible to make. Further, the log files revealed that learner
productivity did not equal learner performance, as the most industrious decoder 
was not the most successful, nor the most successful the most productive. 
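
The distinction between productivity and performance can be made concrete with a small Python sketch. It assumes that productivity is measured as the number of letter-drag attempts and performance as the share of attempts that received positive feedback. These operationalisations are assumptions made for illustration. The event type 'letter_drag_right' appears in Figure 2, whereas 'letter_drag_false' is an assumed name for negative feedback, and the two learners and their counts are invented.

```python
from collections import Counter

RIGHT = "letter_drag_right"  # appears in the Figure 2 extract
FALSE = "letter_drag_false"  # assumed name for negative feedback


def drag_stats(events):
    """Return (number of drag attempts, success rate in percent or None)."""
    counts = Counter(e["type"] for e in events)
    attempts = counts[RIGHT] + counts[FALSE]
    rate = 100 * counts[RIGHT] / attempts if attempts else None
    return attempts, rate


# Invented event lists for two hypothetical learners.
learners = {
    "A": [{"type": RIGHT}] * 7 + [{"type": FALSE}] * 3,
    "B": [{"type": RIGHT}] * 4,
}

stats = {name: drag_stats(events) for name, events in learners.items()}
most_productive = max(stats, key=lambda name: stats[name][0])
most_successful = max(stats, key=lambda name: stats[name][1])
print(stats, most_productive, most_successful)
```

In this invented example learner A makes the most attempts while learner B has the higher success rate, which is the kind of divergence between the two measures reported above.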


Qualitative log-file analysis showed that learners employed various ways to solve
tasks and that not all strategies were equally well suited to all users. In many instances,
the lack of successful strategies indicated an inability to learn independently, while
increased autonomy and decoding proficiency were seen to stem from an increased
use of efficient strategies. As log files track every single event, they show whether
and how often learners make use of the provided help tools. In DL, users could
press buttons to listen to letter and word sounds for the words they were decoding;
in DW, they could listen to the words’ letter sounds and were provided with an
additional help tool, a soundbar (see Figure 1). The results indicate a correlation
between learner proactivity (using the tools independently) and decoding success.
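
How often a learner used the help tools could, in principle, be counted directly from the event list. The Python sketch below is a hedged illustration only. Of the event types it uses, 'play_word_sound' is taken from Figure 2, while 'play_letter_sound' and 'soundbar_click' are hypothetical names, as are the function name and the sample events.

```python
from collections import Counter

# 'play_word_sound' is attested in Figure 2; the other two names are assumptions.
HELP_EVENT_TYPES = {"play_word_sound", "play_letter_sound", "soundbar_click"}


def help_tool_usage(events):
    """Count how often each help-tool event occurs in one log file."""
    return Counter(e["type"] for e in events if e["type"] in HELP_EVENT_TYPES)


# Invented event list for one hypothetical exercise attempt.
sample_events = [
    {"type": "play_word_sound"},
    {"type": "letter_drag"},
    {"type": "play_letter_sound"},
    {"type": "letter_drag_right"},
    {"type": "play_word_sound"},
]

usage = help_tool_usage(sample_events)
print(dict(usage), sum(usage.values()))
```

A log file in which this count is zero would be an instance of the absence of activity discussed in the following paragraph.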


“Sometimes the absence of activity can be as revealing as its presence” (Bruckman, 
2006, p. 1451), and log files prove that learners do not always do what they are 
expected to, e.g. independently exploring and employing all provided resources 
(word sets, exercises, help tools). These revelations should be taken into account to
enhance the CALL application’s design and effectiveness. Additionally, the results
emphasise weaker students’ need for more instruction and help regarding
successful strategies.


4. Conclusions 


This study has been a challenging, yet rewarding exploration of a new realm, log
files. In sum, even though manual DM is very time-consuming and cumbersome,
the results show that it is possible. Nevertheless, this study acknowledges
that the manual mining procedure applied to the overwhelming abundance of log
files made the analysis highly prone to human error and possibly weakened the
scientific rigour of the current study to some degree. Log files provide unique and
innovative research data, but easy-to-use tools for non-experts are urgently needed
to benefit from the valuable knowledge hidden away in the computer mines. As
EDM is a relatively young research field, it remains to be seen when, not whether,
EDM can make a contribution to research “in terms of providing tools and 
techniques that educational technology researchers can easily grasp and apply to 
their own research” (Angeli et al., 2017, p. 227). 